Mon. Not. R. Astron. Soc. 000, [THM] (2009) Printed 17 December 2010 (MN I^TeX style file v2. 2) 



Subtraction of point sources from interferometric radio 
images through an algebraic forward modeling scheme 

o : 

g G. Bernardi^*, D.A. Mitchell^ S.M. Ord^ L.J. Greenhill\ B. Pindor^, R.B. Wayth^ and 
(N J.S.B. Wyithe^ 

^ ' ^Harvard-Smithsonian Center for Astrophysics, Garden Street 60, Cambridge, MA, 02138 
'^University of Melbourne, School of Physics, Parkville 3010, Australia 

^ ICRAR/Curtin Institute of Radioastronomy, GPO Box U1987, Perth, WA6845, School of Physics, Parkville 3010, Australia 
Accepted xxxx. Received yyyy; in original form zzzz 

ABSTRACT 

We present a method for subtracting point sources from interferometric radio images 
via forward modeling of the instrument response and involving an algebraic nonlinear 
minimization. The method is applied to simulated maps of the Murchison Wide-field 
Array but is generally useful in cases where only image data are available. 

After source subtraction, the residual maps have no statistical difference to the 
expected thermal noise distribution at all angular scales, indicating high effectiveness 
in the subtraction. Simulations indicate that the errors in recovering the source pa- 
rameters decrease with increasing signal-to-noise ratio, which is consistent with the 
theoretical measurement errors. 

In applying the technique to simulated snapshot observations with the Murchison 
Wide-field Array, we found that all 101 sources present in the simulation were recovered 
with an average position error of 10 arcsec and an average fiux density error of 0.15%. 
This led to a dynamic range increase of approximately 3 orders of magnitude. Since 
all the sources were deconvolved jointly, the subtraction was not limited by source 
sidelobes but by thermal noise. 

This technique is a promising deconvolution method for upcoming radio arrays 
with a huge number of elements, and a candidate for the difficult task of subtracting 
foreground sources from observations of the 21 cm neutral Hydrogen signal from the 
epoch of reionization. 
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1 INTRODUCTION 

The deconvolution of radio point sources is a problem that 
has been studied for several decades in radio astronomy. 

When calibration errors can be neglected, the problem 
of subtracting point sources from deconvolved radio images 
ultimately reduces to a problem of fitting their positions 
and flux densities as accurately as the instrumental noise 
permits. 

The methods used to deconvolve point source sidelobes 
are typically based on the CLEAN algorithm (Hogbom 1974; 
Clark 1980). The CLEAN algorithm looks for the brigthtest 
pixel in the image and subtracts a fraction of the dirty beam 
from the image at that location, forming a residual image. 
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The search and subtraction loop is repeated until the side- 
lobes are reduced below the thermal noise level. 

The model components that are found through this it- 
erative process can be convolved with a two dimensional 
Gaussian and introduced back into the residual image. The 
best estimate of flux density and position for each source 
is then found by fitting a two dimensional Gaussian to the 
source. 

The subtraction of point sources performed in this way 
has the known problem that the dynamic range achievable is 
limited by pixelization effects, i.e. by the fact that data are 
averaged and arranged into a regular grid. Therefore even a 
simple point source that does not lie at the centre of the grid 
cell cannot be represented by a single delta function model, 
but requires a potentially infinite number of components to 
be fully represented (Briggs & Cornwell 1992, Perley 1999). 

In presence of the visibility data, the pixelization prob- 
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lem can be minimized and the dynamic range improved by 
subtraction of sources from the ungridded visibilities (Noor- 
dam & de Bruyn 1982; Voronkov & Wieringa 2004) and by 
centering the local pixel grid on the source to be deconvolved 
(Cotton & Uson 2008). 

When the number of antenna elements to be correlated 
becomes extremely large, however, it becomes harder and 
harder to store the visibility data and the deconvolution has 
to be performed on images with, again, a limitation of the 
dynamic range due to pixelization effects. 

This is a relevant issue for upcoming radio telescopes 
like the Murchison Wide-field Array (MWA, Lonsdale et al. 
2009) or future instrumentation like the SKAQ since they 
will produce a huge number of correlated visibilities. MWA 
will generate data at such a rate (approximately a PByte 
per day) that will be impractical to store the raw visibili- 
ties and go through the traditional selfcalibration loop, and 
the deconvolution of radio sources will happen in the image 
plane. 

The deconvolution of bright point sources is also a 
prominent issue in the view of the detection of the epoch of 
reionization (EoR) through the redshifted 21 cm line emis- 
sion, which is one of the main goals of the MWA. 

The problem of foreground subtraction for EoR experi- 
ments has been studied by various authors in the literature 
(Di Matteo, Ciardi & Miniati 2004; Morales & Hewitt 2004; 
Santos, Cooray & Knox 2005; Morales, Bowman & Hewitt 
2006; Wang et al. 2006, McQuinn et al. 2006; Gleser et al. 
2008; Jehc et al. 2008; Bowman, Morales & Hewitt 2009; Liu 
et al. 2009a; Barker et al. 2009; Liu et al 2009b; Barker et 
al. 2010). Most of their efforts have been devoted to demon- 
strations that the diffuse Galactic synchrotron radiation and 
the classical confusion noise due to unresolved radio sources 
can be subtracted if it is assumed that they are spectrally 
smooth and absent of calibration errors. Recent observations 
(Ah, Bharadwaj & Chengalur 2008; Bernardi et al. 2009; Pen 
et al. 2009; Parsons et al. 2010; Bernardi et al. 2010) have 
started to characterize the diffuse foreground component. 

All the simulations conducted so far, however, have as- 
sumed that the brightest point sources were perfectly sub- 
tracted from the data. Bowman et al. (2009) and Liu et al. 
(2009b) indicated that point sources should be subtracted 
down to a 10-100 mjy threshold in order to detect the EoR. 

Datta, Bhatnagar & Carilli (2009) and Datta, Bowman 
& Carilli (2010) studied the problem of subtraction of bright 
sources in the presence of calibration errors and concluded 
that sources brighter than 1 Jy should be subtracted with 
a positional precision better than 0.1 arcsec if calibration 
errors remain correlated over ~6 hours of observation. If the 
errors are correlated on a shorter time length, however, they 
will tend to average down with time and the requirement for 
positional accuracy will be less stringent. 

Pindor et al. (2010) developed a technique based on 
matched filters to subtract bright point sources in MWA 
images in presence of diffuse emission. They showed that 
the dynamic range of the residual images can be improved 
by a factor of ~2-3 in this way. 

In this paper we present a method of subtracting point 
sources from MWA dirty images that involves forward mod- 
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eling and a nonlinear minimization scheme. Forward model- 
ing is a general concept that can be used to extract astro- 
physical parameters from the data. 

We applied our method to simulated MWA images to 
show that point sources can be deconvolved with an accu- 
racy limited by thermal noise even without storing the visi- 
bility data. 

The paper is organized as follows: in Section [2] we 
present the method, in Section [3] we apply the method to 
MWA simulated images and we conclude in Section |4] 



2 THE METHOD 

The method presented here relies on the fact that the sky 
emission can be forward modeled. Forward modeling is a 
generative model, i.e., a model that is related to the astro- 
physical parameters to be measured, is based on physical 
assumptions, and can be generated a priori, independent 
of the actual data. Once the forward model is determined, 
a minimization scheme (generally nonlinear) can be imple- 
mented to fit for the astrophysical parameters of interest. 

Forward modeling has already been used in several as- 
trophysical contexts; for example. Bailer- Jones (2010) used 
a forward modeling algorithm to estimate stellar parame- 
ters from optical spectra. Forward modeling finds a natural 
application in point source subtraction from radio images 
where visibility data are not accessible anymore. 

In this case, the forward model does not need to be ap- 
proximated by any analytical function but it is simply the 
synthesized beam calculated at that particular position in 
the sky and scaled for the source flux density. In traditional 
radio astronomy, the synthesized beam can be considered 
to remain constant throughout the whole field of view. If we 
consider the future arrays which will operate at low frequen- 
cies, however, the synthesized beam changes as a function 
of position in the map due to wide field effects and direction 
dependent primary beams. If very high dynamic range imag- 
ing is required - as it is to detect the EoR signal -, the exact 
synthesized beam should be computed at each location in 
the map without relying on any analytical approximation. 
For the MWA, real-time calibration data will be stored in 
a database and will be used to generate an accurate set of 
visibilities for each point source of interest. These visibili- 
ties can then be imaged and averaged in the same way that 
the true visibilities were imaged and averaged, resulting in 
a synthesized beam map for each source. 

In the case of point source deconvolution, the astrophys- 
ical parameteres that have to be determined via forward 
modeling are the position and flux density of each point 
source. 

For a single point-source case, our algorithm can be 
described as follows. The image pixels are grouped into an 
A'^-element vector y, where A'^ is the number of pixels in the 
map. Right ascension, declination and flux density of the 
source - i.e, the parameters to be fitted - are grouped into 
a three-element vector x. The forward model m(x) is also a 
A'^-element vector. 

The n-th iteration of the method is described as follows: 

(i) generate the forward model (i.e, an image of the syn- 
thesized beam) m„ = in(x„), for the current parameter 
estimate: right ascension, declination and flux density; 
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Select catalogue srcs 
in region of interest 
to generate sky model 



Select a subset of 
visible sources 
from image data 



Calculate forward model for 
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estimates (position, flux density) 
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^re there unmodeled sources! 
Done 

Figure 1. Flow chart of the source subtraction scheme. 

(ii) compute the x 3 Jacobian matrix, J, which con- 
tains the derivatives of the forward model-synthesized beam 
with respect to the parameters computed at the current pa- 
rameter estimate x„, 

c*mi 

(iii) estimate the difference between the data and the 
model 

Am = y - mn; 

(iv) estimate the shift in each parameter which is the so- 
lution of the linear system of equations 



(J^J)Ax = J^Am 

Ax = (j'^J)~'j'^Am; (1) 

(v) compute the new estimate of the parameters 



x„+i = x„ - Ax; 

Steps (i)-(v) are repeated until convergence is reached (Fig- 
ure [T]). 

Equation [l] shows that the problem of source subtrac- 
tion has become a nonlinear least squares minimization. 

The forward model has a linear dependence on flux den- 
sity, but nonlinear on position, therefore the partial deriva- 
tives with respect to right ascension and declination are com- 
puted numerically using finite difference approximation. 

Practically, the partial derivative with respect to right 
ascension is computed by generating an image of the synthe- 
sized beam with a small right ascension offset from the cur- 
rent estimated position. An image of the synthesized beam 
with a small declination offset is generated to compute the 
partial derivative with respect to declination. The derivative 
with respect to flux density is just a scaled version of the 
synthesized beam. 



The generalization of the single point-source case to M 
sources is straightforward, since the vector of parameters 
becomes a 3M vector, and the Jacobian matrix becomes a 
A'' X 3M matrix, and all the sources are fitted simultane- 
ously at each iteration. It is also important to note that the 
matrix j'^J that has to be inverted does not depend upon 
the number of pixels in the map, but only upon the number 
of parameters, therefore its size increases linearly only with 
the number of sources to be subtracted. 

In principle, an initial estimate of the parameters could 
be obtained by generating a grid of likely models, with a 
range of right ascension, declination and flux densities for 
each source, and selecting the model, xo, that best fits the 
data (i.e., minimizes (y — m)'^(y — m)). In practice it is 
easier and faster to fit an elliptical Gaussian to the source 
position and use its best-fit parameters as the initial guess. 

Equation [T] can be generalized by assigning a weight to 
each pixel of the image. In this case it becomes the general 
expression for nonlinear weighted least squares: 



Ax = (J'^WJ) ^j'^WAm, 



(2) 



where W is the N x N weight matrix. Although different 
weighting schemes could be explored, in the following appli- 
cations of our method we will assume that is a diagonal 
matrix with each diagonal element equal to the signal-to- 
noise ratio (SNR) of the corresponding pixel. 

The advantage of this method compared to other image 
based deconvolution techniques is that the forward model 
can be generated with an arbitrary level of precision in the 
parameter space grid and, therefore, is not affected by any 
pixelization effect. In the following section we will apply this 
method to simulated MWA images. 



3 APPLICATIONS 

In this section we test the method with simulated MWA im- 
ages obtained through the Real Time System (RTS, Mitchell 
et al. 2008). The main RTS data product will be dirty im- 
ages - i.e., images were the synthesized beam has not been 
deconvolved - integrated over a period that can range from 
8 seconds to a few minutes. It is these integrated images 
that require subtraction of point sources. The RTS will 
also save calibration information (primary beam and atmo- 
spheric models) to facilitate accurate off-line deconvolution. 



3.1 Simulation setup 

We simulated a realistic MWA observation, with 20° x 20° 
images covering the MWA field of view. The simulations 
were constructed as follows. We populated the field of view 
with point sources according to the following log A''-log S 
distribution: 

dN = NoS-'^-'^dS, 

where dN is the differential source count, A'o is the number 
of sources per steradian per Jy~^'^ and S is the source flux 
density. We have chosen A'^o in such a way that there are 
100 sources greater than 1 Jy in a 20° x 20° field. Random 
positions were assigned to the sources with no constraint on 
the minimum distance among them. 

Visibility data were then created for the sources at 
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150 MHz, with a 2 sec cadence and over a 40 kHz chan- 
nel width using the MAPS package (Wayth et al. 2010). 

Random noise was added to each ij visibility, for each 
polarization, according to the following expression: 



_ SEFD 



(3) 



where we have assumed a System Equivalent Flux Density 
SEFD^IOOOO Jy, At=2 sec and A/=40 kHz. 

The visibilities were imaged through the RTS, which 
performs calibration and imaging of raw visibility data. 
However, since we assumed a perfect calibration in this work, 
only the imaging part was used. The RTS imaging pipeline 
is described by Ord et al. (2010) and we briefly summarize 
it here, referring the reader to the paper for a more detailed 
presentation. 

Each MWA tile is constituted of 16 dipoles arranged 
in a 4 X 4 square configuration. The dipoles are fixed to 
the ground, therefore their projection on the sky changes 
with time. The primary beam response to the sky brightness 
is, therefore, time variable. We have assumed that the tile 
primary beams can be described by the sum of 16 complex 
numbers that represent gain terms for the individual, known, 
dipole beams. Since we are not dealing with calibration, all 
tile beams are assumed to have the same shape, amplitude 
and phase. 

The RTS expects visibility data from the correlator to 
be integrated over 2 seconds and 40 kHz. The visibility data 
are then averaged and imaged over 8 seconds (Mitchell et 
al. 2008). Each individual 8 second snapshot is then resam- 
pled into the Healpix frame (Gorski et al. 2005) and inte- 
grated over time, with wide-field distortions corrected dur- 
ing the resampling. The time integration is perfomed, for 
each pixel, by summing over the measured values weighted 
by the complex conjugate of their primary beam response 
(the total weight is now the square of the beam). The sum of 
the weights - i.e., the square of the primary beam response 
integrated over the duration of the observation - is divided 
out at the end of the integration. 

We generated images centred at 4'' right ascension and 
—30° declination, which is one of the potential fields for EoR 
observations. We have assumed that the field was observed 
one hour before transit. 

We simulated two different sets of observations related 
to two different array configurations. First, we considered 
the 5% protoype of the array that is currently deployed on 
the ground and is constituted by 32 tiles (32T). Second, 
we considered the full MWA configuration which will con- 
sist of 512 tiles (512T). The 32T system has less sensitivity 
compared to the 512T system and a coarser angular resolu- 
tion since its longest baseline is ~400 m whereas the longest 
baseline is ~1500 m in the 512T configuration. The instan- 
taneous uv coverage of the 32T is also much worse than the 
512T one (Figure [2]). 

The presence of wide-field effects makes the synthesized 
beam position dependent even in the absence of calibration 
errors (Figure [3|. Since we are aiming at achieving a high 
dynamic range subtraction, we will generate the synthesized 
beam for each source at the specific source location to ac- 
count for the difference. 




Figure 2. Simulated 32T uv coverage integrated over 10 minutes 
(top), and simulated 512T uv coverage for an 8 seconds snapshot 
(bottom). The image centre is at 4*^ right ascension and —30° 
declination. 
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Figure 3. An example of the difference between synthesized 
beams at two positions in the 512T simulated image. Solid line; 
the synthesized beam profile at the image centre. Dashed line: 
the synthesized beam profile 9° away from the image centre. Dot- 
dashed line: the difi'erence between the two profiles. In this exam- 
ple the difi'erence is at the 10% level. 
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3.2 32T Results 

We used the 32T simulations to test the apphcabihty of 
our method to integrated snapshot images. Since the long 
integration images taken with MWA will be obtained by co- 
adding individual snapshots - whose duration can vary from 
8 seconds to ~5 minutes - it is relevant to test the capability 
of the algorithm to subtract sources over co-added images, 
where fainter sources can become visible because of sidelobe 
suppression and the lowering of thermal noise. 

We used a 10 min integrated image, which has an rms 
thermal noise of ~52 mjy beam~^, enabling detection of 
sources brighter than ~300 mJy. However, since the compu- 
tational load increases with the number of sources and the 
length of the observation, we limit ourselves to the sixteen 
sources brighter than 4 Jy, working in a case of high SNR. 

The subtraction was performed without any a priori as- 
sumption about the sky model, that is without identifying 
in advance sources via catalogued coordinates. This enables 
us to test the robusteness of the algorithm under realistic 
conditions (i.e., without presupposition of a sky model gen- 
erated by the MWA), where sidelobe structure from sources 
around the sky cannot be filtered. 

For a sixteen source model that includes thermal noise, 
only one source (97 Jy) is clearly visible because its side- 
lobes are bright enough to cover all the remaining sources 
(Figure U). The initial guess regarding the sky brightness 
distribution is limited to the parameters for this one source. 
The subtraction was performed according to the following 
steps: 

(i) the first source parameters were estimated through 
the forward modeling minimization (three iterations, Fig- 
ure [5]); the source model was subtracted, and initial guesses 
obtained for the three newly visible sources; 

(ii) the four brightest sources were included in the sky 
model and subtracted. After three iterations another seven 
sources were detected in the image (Figure [6]) and initial 
estimates of their parameters were made; 

(iii) a sky model made of eleven sources was subtracted. 
After three iterations all the remaining sources were iden- 
tified (Figure [7|) and an initial estimate of their parameters 
performed; 

(iv) the full sky model is minimized and subtracted 
jointly, giving the residual image of Figure [8] 

In order to characterize the statistics of the residuals 
and the accuracy of the subtraction, we compare the true 
fiux densities and positions to the final estimates and to the 
theoretical measurement errors fffjf^'^^^ computed as: 

RA,DEC _ Qh 
^theor - 2 SNR 

(4) 

where O;, ~ 18 arcmin is the synthesized beam (Figure |9]). 

We observe that the error distribution narrows with in- 
creasing flux density and is within the theoretical values. No 
systematic offsets appear in the recovered source parame- 
ters, based on estimates of median and rms values (Table [T]). 

We also computed the angular power spectrum of the 




Figure 4. Simulated image with the 32T uv coverage integrated 
over 10 minutes. The black and white scale runs hnearly between 
-10 and 50 Jy beam"'^. The sidelobes of the dominant source 
(~97 Jy) obliterate the other fifteen sources between it and a 
4 Jy floor. 




Figure 5. Residual image with the brightest source subtracted, 
after three iterations. The black and white scale runs between -10 
and 15 Jy beam~^. Three new sources are visible. 

residual images as (Seljak 1997, Bernardi et al. 2009): 

^^ = ^E^w^*w (5) 
1 

where t = ^ is the usual multipole value, Q is the angular 
scale in degrees, is the solid angle in radians, A^^ is the 



6 G. Bernardi et al. 




Figure 6. Residual image with the four brightest sources sub- 
tracted, after three iterations. The black and white scale runs 
between -2 and 5 Jy beam~^. Seven new sources are visible. 




Figure 7. Residual image with the eleven brightest sources sub- 
tracted, after three iterations. The black and white scale runs 
between -1 and 3 Jy beam~^. Five new sources are visible. 

Table 1 . Median and rms values of the offset between the model 
and the fitted parameters for the simulated 32T image. 

Parameter Median rms 

RA -0.7 arcsec 9.6 arcsec 

DEC -0.3 arcsec 6.6 arcsec 

flux density 0.3% 1.2% 




Figure 8. Residual image with all the sources subtracted, after 
three iterations. The black and white scale runs between -0.4 and 
0.4 Jy beam~^. The image shows only thermal noise. 



number of Fourier modes around a certain I value, X and 
X* are the Fourier transform of the image and its complex 
conjugate respectively and 1 is the two dimensional coordi- 
nate in Fourier space. The power spectrum has a bin width 
of = 50. 

The amplitude of the power spectrum of the residual 
images decreases by more than two orders of magnitude as 
the number of subtracted sources increases (Figure [10)) . 

Figure [10] also shows the noise power spectrum, esti- 
mated as the averaged power spectrum of 100 noise real- 
izations. Each noise realization was generated by imaging 
visibilities which included only noise, following Equation [3] 
A noise power spectrum was computed from each image. 
The estimated noise power spectrum was determined as the 
average among 100 power spectrum realizations. The error 
bars are the standard deviation of the 100 power spectrum 
realizations in each multipole bin. 

It can be seen that the power spectrum of the resid- 
ual image, after the full 16-source sky model is subtracted, 
agrees with the estimated thermal noise over the entire range 
of angular scales probed. This indicates that no systematic 
errors or statistical deviations from Gaussian distributed 
noise are introduced by the method and that the source 
subtraction is accurate down to the thermal noise level. 



3.3 512T Results 

The 512T simulation included 101 sources brighter than 
1 Jy, observed in an 8 second snapshot and in a 40 kHz 
channel (Figure [TT]) . The thermal noise in the 512T image 
is ~26 mjy beam~^. Unlike the 32T case, the 512T image 
prior to forward modeling already exhibits a great number 
of sources, due to the reduced sidelobes of the synthesized 
beam. 
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Figure 10. Power spectra of residual images for an increasing 
number of subtracted sources. From top to bottom; after sub- 
tracting one source (Figure[5ll, after subtracting four sources (Fig- 
ure[6)l, after subtracting eleven sources (Figure[7ll, after the whole 
sky model is subtracted (Figure (Sjl. The error bars are at the la 
confidence level. The dashed line represents the noise power spec- 
trum (see text for details). 
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Figure 9. Errors in the fitted parameters for sources in the 32T 
simulated image; right ascension (top), declination (middle) and 
fiux density (bottom). Solid lines in the two upper plots indicate 
the envelope of the theoretical measurement errors. 




Figure 11. Simulated 8 sec snapshot image with 512T uv cover- 
age. The black and white scale runs between -1 and 2 Jy beam~^. 
The very good synthesized beam has low sidelobe levels and 
makes most of the sources directly visibilc without any subtrac- 
tion. 



The subtraction was performed according to the follow- 
ing steps: 

(i) the brightest fifteen sources were identified and an 
initial guess of their parameters estimated. They were 
then subtracted out through the minimization scheme (Fig- 
ure [HI); 

(ii) another 35 sources were identified and their parame- 



ters estimated. The joint fit is now performed on 50 sources 
simultaneously (Figure [13]) ; 

(iii) all the sources were included in the sky model. The 
minimization was carried out for all the 101 simultane- 
ously and convergence was reached after five iterations (Fig- 
ure [Til); 

The final residual image after the whole sky model is 
subtracted is consistent with the initial thermal noise level, 
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Figure 12. Residual image witii tiie brightest fifteen sources sub- 
tracted, after five iterations. Tlie black and white scale runs be- 
tween -1 and 2 Jy beam~^. 




Figure 13. Residual image with the 50 brightest sources sub- 
tracted, after five iterations. The black and white scale runs be- 
tween -1 and 2 Jy beam~^. The presence of negative peaks is due 
to a subtraction in the absence of a full sky model. 



indicating an accurate subtraction of the sources. It is worth 
noticing that in the intermediate steps, when only a partial 
sky model is subtracted, residual features due to an imper- 
fect subtraction exist, and appear as positive adjacent to 
negative peaks. 

We computed the difference between the true flux den- 




Figure 14. Residual image with all sources subtracted, after 
five iterations. The black and white scale runs between -0.3 and 
0.3 Jy beam~^. Only thermal noise is left after the subtraction 
of the whole sky model. 



Table 2. Median and rms values of the offset between the model 
and the fitted parameters for the simulated 512T image. 

Parameter Median rms 

RA 0.99 arcsec 17 arcsec 

DEC 0.56 arcsec 11 arcsec 

flux density -0.15% 4% 

sity and position values and their final estimates (Figure fTS)) . 
as was done for the 32T simulation. 

As in the 32T case, errors increase with decreasing flux 
densities and are well matched to their expected theoretical 
limits. The median and rms values show no systematic errors 
in the recovered parameters (Table 

The power spectrum of the residual image after sub- 
tracting all 101 sources agrees with the expected thermal 
noise within la error for each multipole value, indicating 
that there is no significant statistical leftover from source 
subtraction (Figure I16p . The power spectra spans almost 
three orders of magnitude because the 512T simulation 
probes the logA''-logS' at lower fiux densities. This demon- 
strates that the algorithm is able to simultaneously remove a 
large number of sources which span two orders of magnitude 
in flux density. 

This is a relevant result in the Ught of EoR measure- 
ments, where a high accuracy in source subtraction is re- 
quired to achieve the necessary dynamic range. The detec- 
tion of the EoR is believed to require this high accuracy 
in foreground subtraction because the cosmological signal is 
5-6 orders of magnitude below the strongest sources in the 
sky. Due to the time constraints that come with the real time 
nature of the MWA, subtracting sources in the visibility do- 
main - "peeling" - is only practical for the brightest sources 
which are also required to accurately constrain antenna pri- 
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Figure 15. Same as Figure[9l but for the 512T simulation. Tlio 
synthesized beam is now ^ 11.8 arcmin. The solid line in the 
bottom figure indicates the SNR~^ envelope. 



mary beam models. In order to achieve accurate calibration 
and subtraction using these sources, a comprehensive global 
sky model is required. This will be obtained by surveying 
the sky in the first months of operation with the full array. 
At the same time, the actual tile beams will be measured 
and used to improve the beam models. The knowledge of 
the sky and the beams can be improved in a bootstrapping 
fashion by repeating the sky survey. 

We expect that the initial 10^ — 10® dynamic range can 
be alleviated by 2-3 orders of magnitude through a very 
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Figure 16. Power spectra of residual images for an increasing 
number of subtracted sources. From top to bottom: initial image 
without subtracting any source (Figure [TTJ, after subtracting fif- 
teen sources (Figure[l2]l, after subtracting 50 sources (Figure [T3)l . 
after the whole sky model is subtracted (Figure I14II . The error 
bars are at the Icr confidence level. The dashed line represents 
the noise power spectrum. 



precise peeling procedure. A further subtraction of the re- 
maining bright sources is required in the integrated images. 

If the dynamic range is expressed as the ratio between 
the brightest source in the map and the noise rms, the source 
subtraction in the 512T case achieves a dynamic range of 
~3400 through our minimimization scheme. 

It is also interesting to introduce the relative dynamic 
range, defined as the ratio between the true fiux density of 
a source and the difference between the true and the recov- 
ered flux density (Pindor et al. 2010). This is another way of 
estimating the residual contamination due to an imperfect 
subtraction. Studies in the literature indicated that bright 
sources should be subtracted down to the 100-10 mjy level 
in order not to affect the subtraction of fainter foreground 
sources and, ultimately, the recovery of the EoR signal (Bow- 
man et al. 2009, Liu et al. 2009b). 

Figure [IT] displays the relative dynamic range for source 
subtraction in the 512T simulation. It can be seen that, with 
the level of noise present in our simulated image, ~92% of 
the sources are above the 100 mJy threshold. 

Since we are fitting all the sources simultaneously and 
iteratively, the main limitation to the relative dynamic range 
comes from thermal noise rather than sidelobe contamina- 
tion. Given the behaviour shown in Figure [151 we expect the 
dynamic range to increase if we consider a longer integration 
where the thermal noise decreases. Section [3.41 will confirm 
this statement. 



3.4 Out of beam sources 

An image has the limitation of excluding all the sources 
outside the image itself (out-of-beam sources). If visibility 
data were accessible, the information corresponding to out- 
of-beam sources would still be accessible and they could 
be subtracted in a traditional selfcalibration-deconvolution 
loop. Once the image is generated and visibility data dis- 
carded, information about out-of-beam sources is lost, apart 
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from the sidelobes, which will still contaminate the image if 
they are bright enough. 

We investigated how well our method subtracts out-of- 
beam sources by minimizing their sidelobe contribution to 
the image, i.e., by fitting the sidelobe pattern of a source, 
regardless of being able to image the source itself. 

In order to make sure that the out-of-beam source 
sidelobes have good SNR, we integrated individual 8 sec 
512T snapshots up to 10 min in a 40 kHz channel (Fig- 
ure I18|l . The thermal noise in our 10 min simulated image 
is 2.8 mjy beam~^. 

In order to reduce the computational load, we included 
only the brightest seven sources used in the 512T simulation, 
therefore the faintest source is ~6 Jy. Two sources were dis- 
placed from their previous position and moved two degrees 
outside the edge of the image. The out-of-beam sources had 
~14.1 Jy and ~12.7 Jy flux densities respectively and we as- 
sumed that an initial estimate of their parameters is know 
from a pre-existing source catalogue. 

The subtraction was performed according to the follow- 
ing steps: 

(i) an initial parameter estimate of the five sources within 
the field of view was computed and the sources were sub- 
tracted ignoring the out-of-beam sources (Figure [19)) : 

(ii) the two out-of-beam sources were included in the sky 
model and a joint parameter estimate performed. The best 
fit model is subtracted from the image (Figure EH)) : 

The final image after the whole sky model was sub- 
tracted is consistent with the thermal noise level and its 
power spectrum agrees with the noise power spectrum at 
all angular scales (Figure I^Tj) . It is important to note that 
power spectrum of the residual image after removing only 
the sources within the field of view is still well above the ex- 
pected noise power spectrum. The subtraction of the side- 
lobe pattern of the out-of-beam sources improves the dy- 
namic range by a further factor of ~5. 

The plot of the relative dynamic range (Figure [22)) con- 
firms the results of Section [3.31 The relative dynamic range 
of the sources inside the field of view has improved by ~2 or- 
ders of magnitude by longer integration and the two sources 



Figure 18. Simulated image with the 10 min 512T uv coverage. 
Five sources are within the field of view and two outside. The 
color scale runs between -0.3 and 1 Jy beam~^. 




Figure 19. Residual image with only the five in-beam sources 
subtracted, after five iterations. The color scale runs between -0.2 
and 0.2 Jy beam~^. The five sources within the field of view were 
well removed revealing the sidelobe pattern of the unsubtracted 
out-of-beam sources. 



with the worst dynamic range are the out-of-beam sources, 
which have a poorer SNRs. All the sources within the field 
of view are now above the 10 mJy threshold. 
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Figure 20. Residual image where all the sources were subtracted. 
Five iterations were performed. The color scale runs between - 
0.03 and 0.03 Jy beam~^. The sidelobe pattern has been removed 
down to the thermal noise level. 
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Figure 21. Power spectra of residual images for an increasing 
number of subtracted sources. From top to bottom: initial im- 
age without subtracting any source (Figure [TSt . after the sources 
within the field of view were subtracted (Figure [TgJ, after all the 
sources (in and out-of-beam) were subtracted (Figure [20t . The er- 
ror bars are at the la confidence level. The dashed line represents 
the noise power spectrum. 



3.5 Computational costs 

In our simulations we have shown that forward modeling 
can achieve a high level of precision in source subtraction. 
This comes, however, with a significant computational cost. 
Up to the number of sources that we have considered in 
our 512T simulation, the greatest computational load comes 
from imaging rather than generating the visibilities or fit- 
ting for the astrophysical parameters. In the case of an 
8 sec snapshot image with a 20° x 20° field of view we esti- 




Source flux density (Jy) 

Figure 22. As in Figure 1171 but for the out-of-beam sources. 
The two sources with the poorest relative dynamic range are the 
out-of-beam sources for which the SNR is lower. 

mated that ~70 Gflops are required to generate each image, 
therefore each iteration of the subtraction scheme requires 
~200 Gflops for every source that has to be subtracted. It 
takes ~40 sec on a normal Dell 2.4 GHz 2 quad Core ma- 
chine. 

Although such a computational need might require a 
very long processing time - particularly for long integra- 
tions -, there are several ways of shortening the processing 
time length. The most straightforward way is to implement 
a Graphics Processing Units (GPU) pipeline in the most 
computationally intensive part of the process, i.e. imaging. 
By running our simulations on a GPU, a factor ~5 in time 
is gained. 

The second shortcut is to parallelize the forward mod- 
eling loop. Although the simulations presented in this work 
were performed in serial, the calculation of the forward 
model and the partial derivatives can be run in parallel, 
potentially on a dedicated GPU machine. 

Finally, it is important to notice that such a high level 
of precision in source removal might be superfluous for very 
faint sources for which calibration errors are larger. In this 
case, convenient approximations in fitting source positions 
(i.e, Pindor et al. 2010) will speed up the calculations and 
might eventually give the same level of accuracy in the sub- 
traction. 



4 CONCLUSIONS 

We have presented a point source deconvolution technique 
that makes use of forward modeling and an algebraic non- 
linear minimization scheme. The main motivation for this 
implementation was achieving high dynamic range images 
in the absence of visibility data. Current (MWA) and future 
(SKA) radio interferometers require such a huge number of 
elements that they are being forced to rely more and more 
on real-time calibration and imaging, without the use of tra- 
ditional selfcalibration techniques. 

The basic idea of our scheme is to forward model the 
sky brightness, i.e., to filter the sky model through the same 
instrumental response that is applied to the data. In the case 
of radio point sources, the forward model is the synthesized 
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beam which is generated for each source individually. In this 
way, position dependent variations of the synthesized beam 
are accounted for. 

Point source astrophysical parameters are recovered 
through a nonlinear minimization over the image pixels. 
In this way we overcome the known dynamic range limi- 
tations of image-based deconvolution due to pixelization ef- 
fects. Since the presented technique minimizes all the sources 
simultaneously and in an iterative way, it is minimally sen- 
sitive to sidelobe noise and essentially limited by thermal 
noise. 

It is worth noticing that this method can be applied 
to different sky components and can incorporate calibration 
parameters such as ionospheric displacements and primary 
beam shapes measured from the actual data. 

The technique was applied to three different simulated 
cases: a 10 min integration with the 32T MWA, an 8 sec 
snapshot image of the 512T MWA, and a 10 min integration 
with the 512T MWA where sources were placed inside and 
outside the field of view. 

In all cases we were able to subtract sources down to the 
thermal noise without assuming an a priori knowledge of the 
sky, with the exception of initializing the position and flux 
density of sources placed outside the field of view. The fi- 
nal residual images are consistent with the expected thermal 
noise on all the angular scales. Errors in the fitted param- 
eters decrease with increasing SNR, in agreement with the 
expected theoretical measurement error distribution. Even 
when sources were not physically present in the images, we 
could subtract their sidelobes down to the thermal noise 
level. 

The 512T simulations are relevant in the light of the 
MWA EoR experiment. Since only a limited number of 
sources can be subtracted in real time, an off-line subtraction 
of the residual sources will have to be performed on the im- 
ages to a high level of accuracy in order to precisely remove 
them and their direction-dependent synthesized beams. 

In the simulation of an 8 sec image with the 512T ar- 
ray, we achieved a dynamic range of ~3400, indicating that 
the subtraction of foreground sources can be improved by 3 
orders of magnitude through this technique. Source param- 
eters can be retrieved with an average error of 10 arcsec on 
positions and 0.15% errors on flux densities. 

The relative dynamic range of our subtraction is limited 
by the thermal noise and is above the 100 mjy threshold 
for 92% of the sources. Since the best fit parameters im- 
prove with the SNR, a lower threshold - i.e. 10 mJy - can 
be reached by lowering the thermal noise through a longer 
integration. In fact, in the 512T 10 min simulation all the 
five sources present in the image had a dynamic range above 
the 10 mJy threshold, indicating that bright sources can be 
subtracted to a level that should not affect the detection of 
the EoR. 

A sky model more realistic than only point sources 
could be forward modeled by modifying the procedure pre- 
sented here. Extended sky emission modeled as a list of delta 
functions (i.e. the equivalent of CLEAN components) could 
be directly treated by the present approach. More sofisti- 
cated modeling of extended emission that uses a set of basis 
functions like, for instance, shapelets (Yatawatta 2010) or a 
principal component analysis (de Oliveira-Costa et al. 2008) 
can be incorporated by convolving the model of the bright- 



ness distribution with the instrumental primary beam and 
then sampling it according to the uv distribution (see Wayth 
et al. 2010 for an example of this approach). 

Future work will investigate these extensions and in- 
clude a more realistic instrument model to better simulate 
the strategies for the EoR detection. 
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